The aim of these exercises is to improve your ability to deal with multi-predictor linear models where the predictors are all continuous or a mixture of continuous and categorical.

A

Let’s continue with the example from Chapter 7 assessing whether dietary supplements improve the perceived health of dogs with osteoarthritis. The model we focused on at the end of that exercise was one modelling the pain index of dogs after 60 days as a function of whether they received dietary supplements or a placebo and the sex of the dog. The dogs unavoidably varied in body weight, ranging from 14-47 kg. To partly account for this, the authors adjusted the doses to a constant amount per kg of body weight. However, you can probably think of a range of ways in which weight might affect osteoarthritis. The model using treatment and sex fitted the data fairly well, with r2 around 0.6. We detected a strong treatment effect, but it is possible that if we reduced background noise, we might see sex-specific responses and we’d also get a more precise estimate of the effects.

Think about the steps you’d take to see if it would be helpful to include body weight in the model, then go back to the data and run the analysis.

Start by reloading the data file from Chapter 7 exercises

df <- read.csv("data/martello.csv")
head(df,10)
#Tidy up the names if original data used
#df_names<-c(GROUP="group", sterilizzato = "ster", BCS = "bcs", PESO.KG = "wt", ETA = "eta", RAZZA = "breed",
#                  HCPI = "hcp0", HCPI.4 = "hcp40", HCP.6 = "hcp60",
#                  SEGNI.OA.VET = "vet0", SEGNI.AO.VET.4 = "vet40", SEGNI.AO.VET.6 = "vet60")
#df<-rename(df, all_of(df_names))
# make sex, treatment, and sterilization factors
df$sex<-as.factor(df$sex)
df$group<-as.factor(df$group)
df$ster<-as.factor(df$ster)
#set contrasts to sum
contrasts=list(group=contr.sum, sex=contr.sum, ster=contr.sum)

The model we had in that exercise linked final owner-assessed health (hcp60) to treatment (group), sex, and possibly sterilization status (ster).

For simplicity, let’s use the treatment/sex model, and start by running it for comparison

martello.lm <- lm(hcp60~group*sex, data=df)
summary(martello.lm)

Call:
lm(formula = hcp60 ~ group * sex, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.636  -2.667  -0.455   3.167   9.364 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)      30.64       1.44   21.29  < 2e-16 ***
groupTRT         -9.97       2.15   -4.65  4.4e-05 ***
sexM             -3.75       2.15   -1.75    0.089 .  
groupTRT:sexM     1.54       3.03    0.51    0.616    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.77 on 36 degrees of freedom
Multiple R-squared:  0.548, Adjusted R-squared:  0.511 
F-statistic: 14.6 on 3 and 36 DF,  p-value: 2.25e-06
emmeans(martello.lm, ~ group:sex)
 group sex emmean   SE df lower.CL upper.CL
 CTR   F     30.6 1.44 36     27.7     33.6
 TRT   F     20.7 1.59 36     17.4     23.9
 CTR   M     26.9 1.59 36     23.7     30.1
 TRT   M     18.4 1.44 36     15.5     21.4

Confidence level used: 0.95 

Start by checking covariate ranges

boxplot(wt~group*sex, data=df)

Some slight differences, including sex-based ones, but ranges overlap, so probably OK

Now run model and look at residuals. Try model with 3 slopes terms first

martello2.lm <- lm(hcp60~group*sex*wt, data=df)
plot(martello2.lm)

summary(martello2.lm)

Call:
lm(formula = hcp60 ~ group * sex * wt, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-9.753 -3.056  0.016  3.248  9.118 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        40.631      8.520    4.77  3.9e-05 ***
groupTRT          -27.277     11.872   -2.30    0.028 *  
sexM              -13.625     11.140   -1.22    0.230    
wt                 -0.388      0.326   -1.19    0.242    
groupTRT:sexM      17.129     14.530    1.18    0.247    
groupTRT:wt         0.642      0.430    1.49    0.146    
sexM:wt             0.385      0.382    1.01    0.321    
groupTRT:sexM:wt   -0.584      0.494   -1.18    0.246    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.88 on 32 degrees of freedom
Multiple R-squared:  0.58,  Adjusted R-squared:  0.488 
F-statistic: 6.31 on 7 and 32 DF,  p-value: 0.000107
Anova(martello2.lm, type='III')
Anova Table (Type III tests)

Response: hcp60
             Sum Sq Df F value  Pr(>F)    
(Intercept)     542  1   22.74 3.9e-05 ***
group           126  1    5.28   0.028 *  
sex              36  1    1.50   0.230    
wt               34  1    1.42   0.242    
group:sex        33  1    1.39   0.247    
group:wt         53  1    2.22   0.146    
sex:wt           24  1    1.01   0.321    
group:sex:wt     33  1    1.39   0.246    
Residuals       763 32                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nothing here to suggest that slopes are differnt across the categorical predictors, so lets run an intercepts model

martello3.lm <- lm(hcp60~group*sex+wt, data=df)
plot(martello3.lm)

summary(martello3.lm)

Call:
lm(formula = hcp60 ~ group * sex + wt, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.689  -2.743  -0.366   3.019   9.544 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    30.0358     2.9452   10.20  5.1e-12 ***
groupTRT      -10.0435     2.1967   -4.57  5.8e-05 ***
sexM           -3.9639     2.3616   -1.68     0.10    
wt              0.0233     0.0995    0.23     0.82    
groupTRT:sexM   1.7428     3.1992    0.54     0.59    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.84 on 35 degrees of freedom
Multiple R-squared:  0.549, Adjusted R-squared:  0.498 
F-statistic: 10.7 on 4 and 35 DF,  p-value: 9.35e-06
Anova(martello3.lm, type='III')
Anova Table (Type III tests)

Response: hcp60
            Sum Sq Df F value  Pr(>F)    
(Intercept)   2433  1  104.00 5.1e-12 ***
group          489  1   20.90 5.8e-05 ***
sex             66  1    2.82    0.10    
wt               1  1    0.06    0.82    
group:sex        7  1    0.30    0.59    
Residuals      819 35                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
emmeans(martello3.lm, ~group|sex)
sex = F:
 group emmean   SE df lower.CL upper.CL
 CTR     30.7 1.51 35     27.7     33.8
 TRT     20.7 1.61 35     17.4     24.0

sex = M:
 group emmean   SE df lower.CL upper.CL
 CTR     26.8 1.70 35     23.3     30.2
 TRT     18.5 1.46 35     15.5     21.4

Confidence level used: 0.95 

Body weight plays very little role here. Adding it to the model only raises the variance explained by 0.1%, and not surprisingly, the estimated effects haven’t changed much either.

B

LaRosa and Connor (Amer. J. Bot., 2017) examined effects of several floral traits on fitness components of milkweeds, Asclepias spp. The fitness components were male and female pollination success and female reproductive success.

In the paper, their analysis focused on 6 predictors, They measured six floral traits, although one of them, hood height, was not relevant for Asclepias viridiflora, which was the species with the largest sample size. - gynostegium width, - hood length, - hood height, - horn reach, - slit length, and - gap width

Their Figure 2 shows what these measurements represent on flowers.

The data are available from Dryad here.

The floral traits were different between species, so data analysis would require each species to be analyzed separately or for the measurements to be standardized within each species.

df <- read.csv("data/larosa.csv")
head(df,10)
df_syr<-subset(df,species=="Asyr")
df_vir<-subset(df, species=='Avir')

Fitness component estimates were relativized by dividing by the mean, and the traits were standardized to a mean of zero and standard deviation of one.

Start by looking at A. syriaca, then for comparison, look at how these floral traits affect A. viridiflora

First look at the removals per flower

What checks should you do before assessing the predictors’ effects?

scatterplotMatrix(data=df_syr,~ removals.per.flower + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)

cor(df_syr[,c('gyn.width', 'hood.length', 'hood.height', 'horn.reach', 'slit.length', 'gap.width')])
            gyn.width hood.length hood.height horn.reach slit.length gap.width
gyn.width      1.0000       0.299       0.280      0.228     0.38647   0.04232
hood.length    0.2985       1.000       0.418      0.626     0.26245   0.15720
hood.height    0.2800       0.418       1.000      0.610     0.33986   0.25232
horn.reach     0.2282       0.626       0.610      1.000     0.34591   0.25328
slit.length    0.3865       0.262       0.340      0.346     1.00000   0.00769
gap.width      0.0423       0.157       0.252      0.253     0.00769   1.00000
vif(lm(data=df_syr, removals.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width))
  gyn.width hood.length hood.height  horn.reach slit.length   gap.width 
       1.26        1.72        1.71        2.26        1.31        1.10 

Diagnostics OK

syr1.lm<-lm(data=df_syr, removals.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr1.lm)

summary(syr1.lm)

Call:
lm(formula = removals.per.flower ~ gyn.width + hood.length + 
    hood.height + horn.reach + slit.length + gap.width, data = df_syr)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0796 -0.4980  0.0352  0.7252  2.1521 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)     2.01       3.54    0.57   0.5730   
gyn.width     -18.93      15.07   -1.26   0.2166   
hood.length    12.90       4.96    2.60   0.0131 * 
hood.height    13.83       4.76    2.90   0.0061 **
horn.reach    -10.92       7.09   -1.54   0.1315   
slit.length   -16.48      15.67   -1.05   0.2995   
gap.width     -28.02      14.19   -1.97   0.0556 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.06 on 38 degrees of freedom
Multiple R-squared:  0.339, Adjusted R-squared:  0.234 
F-statistic: 3.24 on 6 and 38 DF,  p-value: 0.0113

If you’re happy with your pre-flight checks, fit the model and make some conclusions about the effects of each predictor, including any notes of caution

Look at results of model fitting

options(digits=3)
tidy(syr1.lm, conf.int = TRUE)
glance(syr1.lm)

Get standardized coefficients with lmbeta

lm.beta.syr1 <- lm.beta(syr1.lm)
tidy(lm.beta.syr1, conf.int = TRUE)

Run through same sequence for the other two life-history traits

First do insertions.per.flower

scatterplotMatrix(data=df_syr,~ insertions.per.flower + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)


syr2.lm<-lm(data=df_syr, insertions.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr2.lm)

summary(syr2.lm)

Call:
lm(formula = insertions.per.flower ~ gyn.width + hood.length + 
    hood.height + horn.reach + slit.length + gap.width, data = df_syr)

Residuals:
   Min     1Q Median     3Q    Max 
-0.417 -0.173 -0.009  0.122  0.571 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0181     0.7873    0.02     0.98
gyn.width    -1.1430     3.3512   -0.34     0.73
hood.length   1.7626     1.1026    1.60     0.12
hood.height   1.7360     1.0592    1.64     0.11
horn.reach   -2.5932     1.5759   -1.65     0.11
slit.length  -0.4547     3.4844   -0.13     0.90
gap.width    -3.0716     3.1561   -0.97     0.34

Residual standard error: 0.235 on 38 degrees of freedom
Multiple R-squared:  0.138, Adjusted R-squared:  0.00134 
F-statistic: 1.01 on 6 and 38 DF,  p-value: 0.433
options(digits=3)
tidy(syr2.lm, conf.int = TRUE)
glance(syr2.lm)

lm.beta.syr2 <- lm.beta(syr2.lm)
tidy(lm.beta.syr2, conf.int = TRUE)

Model explains less of variation; overall rsq much lower. Same two predictors have highest standardized coefficients, but largely noise here

Now do numbers of fruits

scatterplotMatrix(data=df_syr,~ fruits + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)


syr3.lm<-lm(data=df_syr, fruits ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr3.lm)

summary(syr3.lm)

Call:
lm(formula = fruits ~ gyn.width + hood.length + hood.height + 
    horn.reach + slit.length + gap.width, data = df_syr)

Residuals:
   Min     1Q Median     3Q    Max 
-9.142 -3.143 -0.369  2.709 15.094 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -30.09      17.79   -1.69    0.100 .
gyn.width     102.33      75.42    1.36    0.184  
hood.length    -9.25      25.11   -0.37    0.715  
hood.height    11.39      23.68    0.48    0.633  
horn.reach     85.09      39.04    2.18    0.036 *
slit.length    -1.08      77.23   -0.01    0.989  
gap.width    -107.72      72.86   -1.48    0.148  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.11 on 35 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.292, Adjusted R-squared:  0.17 
F-statistic:  2.4 on 6 and 35 DF,  p-value: 0.0475
options(digits=3)
tidy(syr3.lm, conf.int = TRUE)
glance(syr3.lm)

lm.beta.syr3 <- lm.beta(syr3.lm)
tidy(lm.beta.syr3, conf.int = TRUE)

Still lots of unexplained variation. Only one predictor has much influence (horn reach)

What would you need to check in doing analyses on three different fitness components as response variables?

Check that they aren’t correlated with each other.

cor(df_syr[,c('removals.per.flower', 'insertions.per.flower', 'fruits')])
                      removals.per.flower insertions.per.flower fruits
removals.per.flower                 1.000                 0.499     NA
insertions.per.flower               0.499                 1.000     NA
fruits                                 NA                    NA      1

OK; one r=0.5

What do you conclude about the floral traits’ influence on fitness components of this species?

Floral traits affect two of the three components - hood parameters are positively related to pollen removal rates, and horn reach is linked to higher fruit numbers.

Now have a look at the data for the more common species Asclepias viridiflora*

Remember that one floral trait, hood height, isn’t relevant for flowers of this species

Just show big code block here.

#Run diagnostics
scatterplotMatrix(data=df_vir,~ removals.per.flower + gyn.width + hood.length + hood.height + slit.length + gap.width)

cor(df_vir[,c('gyn.width', 'hood.length', 'hood.height', 'slit.length', 'gap.width')])
            gyn.width hood.length hood.height slit.length gap.width
gyn.width       1.000      0.1286      0.1672     0.24030   0.25034
hood.length     0.129      1.0000      0.1527     0.08002  -0.04063
hood.height     0.167      0.1527      1.0000     0.22310  -0.04564
slit.length     0.240      0.0800      0.2231     1.00000   0.00707
gap.width       0.250     -0.0406     -0.0456     0.00707   1.00000
vif(lm(data=df_vir, removals.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width))
  gyn.width hood.length hood.height slit.length   gap.width 
       1.17        1.04        1.09        1.10        1.08 
vir1.lm<-lm(data=df_vir, removals.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir1.lm)

summary(vir1.lm)

Call:
lm(formula = removals.per.flower ~ gyn.width + hood.length + 
    hood.height + slit.length + gap.width, data = df_vir)

Residuals:
   Min     1Q Median     3Q    Max 
-2.034 -0.463  0.041  0.526  1.549 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    1.510      1.427    1.06     0.29  
gyn.width     -0.773      4.754   -0.16     0.87  
hood.length    3.647      5.684    0.64     0.52  
hood.height    3.613      1.750    2.06     0.04 *
slit.length   -2.939      6.213   -0.47     0.64  
gap.width     -6.765      6.462   -1.05     0.30  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.747 on 206 degrees of freedom
Multiple R-squared:  0.0315,    Adjusted R-squared:  0.00803 
F-statistic: 1.34 on 5 and 206 DF,  p-value: 0.248
#Look at results of model fitting

options(digits=3)
tidy(vir1.lm, conf.int = TRUE)
glance(vir1.lm)

#Get standardized coefficients with lmbeta

lm.beta.vir1 <- lm.beta(vir1.lm)
tidy(lm.beta.vir1, conf.int = TRUE)

#Run through same sequence for the other two life-history traits

scatterplotMatrix(data=df_vir,~ insertions.per.flower + gyn.width + hood.length + hood.height + slit.length + gap.width)


vir2.lm<-lm(data=df_vir, insertions.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir2.lm)

summary(vir2.lm)

Call:
lm(formula = insertions.per.flower ~ gyn.width + hood.length + 
    hood.height + slit.length + gap.width, data = df_vir)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2910 -0.1235 -0.0343  0.1059  0.4594 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   0.0259     0.3269    0.08    0.937  
gyn.width    -0.4290     1.0889   -0.39    0.694  
hood.length  -0.0425     1.3019   -0.03    0.974  
hood.height   0.2837     0.4007    0.71    0.480  
slit.length   2.8794     1.4231    2.02    0.044 *
gap.width    -2.3061     1.4802   -1.56    0.121  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.171 on 206 degrees of freedom
Multiple R-squared:  0.0393,    Adjusted R-squared:  0.016 
F-statistic: 1.68 on 5 and 206 DF,  p-value: 0.14
options(digits=3)
tidy(vir2.lm, conf.int = TRUE)
glance(vir2.lm)

lm.beta.vir2 <- lm.beta(vir2.lm)
tidy(lm.beta.vir2, conf.int = TRUE)


scatterplotMatrix(data=df_vir,~ fruits + gyn.width + hood.length + hood.height + slit.length + gap.width)


vir3.lm<-lm(data=df_vir, fruits ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir3.lm)

summary(vir3.lm)

Call:
lm(formula = fruits ~ gyn.width + hood.length + hood.height + 
    slit.length + gap.width, data = df_vir)

Residuals:
   Min     1Q Median     3Q    Max 
-3.864 -1.570 -0.405  0.974 14.857 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -3.46       4.73   -0.73  0.46489    
gyn.width      15.72      15.90    0.99  0.32412    
hood.length    64.00      18.85    3.40  0.00083 ***
hood.height    -2.04       6.02   -0.34  0.73444    
slit.length    10.27      20.78    0.49  0.62171    
gap.width      -7.30      21.49   -0.34  0.73432    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.46 on 199 degrees of freedom
  (7 observations deleted due to missingness)
Multiple R-squared:  0.0681,    Adjusted R-squared:  0.0447 
F-statistic: 2.91 on 5 and 199 DF,  p-value: 0.0147
options(digits=3)
tidy(vir3.lm, conf.int = TRUE)
glance(vir3.lm)

lm.beta.vir3 <- lm.beta(vir3.lm)
tidy(lm.beta.vir3, conf.int = TRUE)

Hood height effect detected for removals per flower, but model rsq<1%, and standardized estimate not particularly large Slit length detected for insertions per flower, but rsq again very low, 1.6%, and standardized estimate low Fruits also with poor model fit, rsq <5%; highly “significant” effect of hood length detected. Standardized estimates low, 0.24 - strongest for this species.

What do you conclude about the role of floral traits in these two species?

Different floral traits matter - the best predictor of each fitness component is different between the two species.

Is there anything you’d be cautious about in making this comparison?

Different detection abilities for the two species, because one is commoner than the other, and easier to get a large sample.

Watch out for students using statistical signficance only to guide conclusions

Different sensitivies probably mean that two of the 3 effects for A. viridiflora wouldn’t have been detected with the A. syriaca sample size

Standardized coefficients suggest weaker overall effects in A. viridiflora

C

Recall the sengi example from Chapter 5 (or go back and look at it ;-)). The research questions are about family differences. Could also look at this relationship between sengis and the rest. There are 2 or 3 groups of insectivores, sengis and closely (afrotherian) and more distantly (laurasiatherian) species, and the research question is about where sengis fit. We can frame this as 2 or 3 questions.

  1. Does the new species (udzugwensis) fit within the pattern of other sengi?

  2. Are sengi different from the other small insectivores in their brain size?

    1. sengi vs all others, or

    2. sengi vs closely-related vs distantly related insectivores

Get started by loading the kaufman data. In Chapter 5, you should have decided that log-transforming both variables was sensible, so lets also start by defining new variables logbrain and log body. That will make the coding tidier, without having to log things repeatedly.

df <- read.csv("data/kaufman.csv")
df$logbrain <- log(df$brainmass)
df$logbody <- log(df$bodymass)

For the first question, cast your mind back to Chapter 6. How would you decide whether the new species is unusual?

Use the existing species to calculate a linear regression, then calculate the predicted value for a species of the body mass of R. udzugwensis.

sengi <- filter(df, family == "Macroscelididae")
sengi
sengi.lm <- lm(data=filter(sengi, species != "udzugwensis"), logbrain ~ logbody)
glance(sengi.lm)
tidy(sengi.lm)

# Now predict values for *R. udzu..*

predict(sengi.lm, data.frame(logbody = c(6.565)), interval="prediction", se=T)
$fit
   fit lwr  upr
1 8.91 8.5 9.32

$se.fit
[1] 0.0605

$df
[1] 2

$residual.scale
[1] 0.0723

Predicted log brain mass is 8.91, and the prediction interval is 8.50 - 9.31, so this species is pretty much smack on the line.

Now lets compare sengis to the other insectivores. Use three groups for comparison (sengi, Afrotherian and Laurasiatherian). These groups are defined in the variable relation

** You could make this comparison in two ways:

  • fit a linear model including the groups and log body mass, i.e. an analysis of covariance
  • look at the patterns in the residuals for the relationship between log-brain and log-body

**Before you start, are there any things to check in the original data, linked to the assumptions of the linear model you’ll fit?

The other important thing to note is that if we’re looking to compare sengis and other groups formally, we need to be careful about the ranges of body sizes. Sengi species start around the middle of the range, from 45g up. We’d probably restrict our comparison to species around this size - exclude those with body mass less than a threshold. Threshold would be arbitrary, but most people would choose 40 or 45g

ggplot(data=df, aes(x=log(bodymass), y=log(brainmass), colour = relation, shape = relation))+
  geom_point()+ theme_light() + scale_color_uchicago()

Analysis of covariance

Outline the steps you’ll take

  1. Fit complex model allowing slopes to vary among groups
  2. Assess whether the groups*logbody term should remain in the model
  3. If relationships can be treated as parallel, run simpler model with groups and logbody
  4. Examine results and decide whether to do any comparisons among groups
df$relation<-factor(df$relation)
df$relation2<-factor(df$relation2)
contrasts = list (relation = contr.sum)
# Create subset of file. We could keep running the filter each time we run the model, but doing it once is tidier.
df2 <- filter(df, bodymass > 40)
lm2<-aov(logbrain~logbody + relation + logbody*relation, data = df2)
glance(lm2)
tidy(lm2)
Anova(lm2, type='III')
Anova Table (Type III tests)

Response: logbrain
                 Sum Sq Df F value  Pr(>F)    
(Intercept)        9.49  1  126.81 7.8e-11 ***
logbody            3.41  1   45.58 6.9e-07 ***
relation           0.05  2    0.30    0.74    
logbody:relation   0.07  2    0.45    0.64    
Residuals          1.72 23                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

logbody*relation term contributing little to model fit, so run simpler model

lm3<-aov(logbrain~logbody + relation, data = df2)
glance(lm3)
tidy(lm3)
Anova(lm3, type='III')
Anova Table (Type III tests)

Response: logbrain
            Sum Sq Df F value  Pr(>F)    
(Intercept)  23.29  1     326 7.6e-16 ***
logbody      12.99  1     182 5.8e-13 ***
relation      1.86  2      13 0.00013 ***
Residuals     1.79 25                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Get adjusted means
library(effects)
adjmeans <- effect("relation", lm3, se=TRUE)
summary(adjmeans)

 relation effect
relation
    afrotherian laurasiatherian           sengi 
           7.17            7.34            7.91 

 Lower 95 Percent Confidence Limits
relation
    afrotherian laurasiatherian           sengi 
           7.01            7.18            7.66 

 Upper 95 Percent Confidence Limits
relation
    afrotherian laurasiatherian           sengi 
           7.34            7.50            8.16 
ggplot(data=df2, aes(x=log(bodymass), y=log(brainmass), group= relation, colour = relation, shape = relation))+
  geom_point()+ geom_smooth(method=lm) + theme_light() + scale_color_uchicago()

No need to proceed further. We conclude that slopes differ between the groups, and sengi are clearly above the other two. The other two groups are close, with abutting confidence intervals around the adjusted means.

Use residuals from a regression of all data and compare residuals between groups
lm4 <- lm(log(brainmass) ~ log(bodymass), data=df2)
df3 <- cbind(df2, lm4$residuals)
head(df3,10)
boxplot(lm4$residuals ~ relation, data = df3) 

lm5 <- aov (lm4$residuals ~ relation, data = df3)
summary(lm5)
            Df Sum Sq Mean Sq F value  Pr(>F)    
relation     2   1.80   0.901    12.6 0.00015 ***
Residuals   26   1.85   0.071                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Pattern also clear here, just from box plots

D

We’ll return to the elephant seal example in the study by LaBoeuf et al., and see whether body weight plays any role in foraging. In Chapter 5, you should have noticed that while the focus of the initial analysis was on the relationship between distance travelled and time spend on the foraging grounds, the authors recorded weight on departure for each animal. Your exploratory data analysis should have shown a relationship between body weight and the original predictor and response variables. Now try and make some sense of what’s going on here.

#Get the data file back
df <- read.csv("data/leboeuf.csv")
head(df,10)

Think about how body mass might influence distance travelled and how it might contribute to time on foraging areas

Mass could easily be linked to swimming speed, allowing larger animals to forage for longer. We could even get more complicated and speculate whether larger males may spend more time maintaining dominance on the shore, so might forage less. In either case, the two variables could be linked.

Body mass might also reflect overall condition, and, for example, males in poorer condidtion might need to spend longer feeding.

How will you assess whether including body weight as a second predictor helps us understand feeding time better?

Time on foraging = β0 + β1 Distance travelled + β2 Body weight

Need to check collinearity - use VIF

Check residuals

Check linearity

cor(df[,c('distance','departwt')])
         distance departwt
distance        1       NA
departwt       NA        1
laboeuf1.lm <- lm (FFAduration~ distance + departwt, data = df)
vif(laboeuf1.lm)
distance departwt 
    4.34     4.34 
plot(laboeuf1.lm)

VIF value not small, but below the common threshold of 5

Nothing dramatic in residuals, though points 13, 14, and 16 have large residuals

augment(laboeuf1.lm)

Fit the appropriate model to the data, interpret the results, and explain whether body weight helps us.

Get the equation

\[ \operatorname{FFAduration} = \alpha + \beta_{1}(\operatorname{distance}) + \beta_{2}(\operatorname{departwt}) + \epsilon \]

glance(laboeuf1.lm)
tidy(laboeuf1.lm)

Get standarised regression coefficients

Use lm.beta function from lm.beta package

lm.beta.laboeuf <- lm.beta(laboeuf1.lm)
tidy(lm.beta.laboeuf, conf.int = TRUE)
glance(lm.beta.laboeuf)

Show the partial regression (added-variable) plots

avPlots(laboeuf1.lm, ask=F)

Conclusion is that there’s a strong statistical relationship. The model explains around 60% of variation, and that effect is largely associated with distance travelled. Time on foraging grounds falls sharply with distance.

Departure rate plays little role.

Is there anything else you might look at?

The relationship of distance to both response and the other predictor might make it worth looking for a non-additive model

laboeuf2.lm <- lm (FFAduration~ distance + departwt + distance*departwt, data = df)
vif(laboeuf2.lm)
there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif
         distance          departwt distance:departwt 
             91.6              41.6             225.4 
# High vif values, as expected, so centre predictors and run again
df$cdistance <- scale(df$distance, center=TRUE, scale=FALSE)
df$cdepartwt <- scale(df$departwt, center=TRUE, scale=FALSE)
laboeuf3.lm <- lm (FFAduration~ cdistance + cdepartwt + cdistance*cdepartwt, data = df)
vif(laboeuf3.lm)
there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif
          cdistance           cdepartwt cdistance:cdepartwt 
               7.89                8.43                1.95 
lm.beta.laboeuf3 <- lm.beta(laboeuf3.lm)
tidy(lm.beta.laboeuf3, conf.int = TRUE)
glance(lm.beta.laboeuf3)
NA

Actually a worse model; more issues with VIF, adjusted r-sq lower, and combined term contributes nothing

---
title: "Ch 8 exercises"
output:
  html_notebook:
    theme: flatly
---

The aim of these exercises is to improve your ability to deal with multi-predictor linear models where the predictors are all continuous or a mixture of continuous and categorical.

```{r echo=FALSE, results='hide'}
source("R/libraries.R")
library(lm.beta)
```

### A

Let's continue with the example from Chapter 7 assessing whether dietary supplements improve the perceived health of dogs with osteoarthritis. The model we focused on at the end of that exercise was one modelling the pain index of dogs after 60 days as a function of whether they received dietary supplements or a placebo and the sex of the dog. The dogs unavoidably varied in body weight, ranging from 14-47 kg. To partly account for this, the authors adjusted the doses to a constant amount per kg of body weight. However, you can probably think of a range of ways in which weight might affect osteoarthritis. The model using treatment and sex fitted the data fairly well, with r^2^ around 0.6. We detected a strong treatment effect, but it is possible that if we reduced background noise, we might see sex-specific responses and we'd also get a more precise estimate of the effects.

Think about the steps you'd take to see if it would be helpful to include body weight in the model, then go back to the data and run the analysis.

> Start by reloading the data file from Chapter 7 exercises

```{r}
df <- read.csv("data/martello.csv")
head(df,10)
#Tidy up the names if original data used
#df_names<-c(GROUP="group", sterilizzato = "ster", BCS = "bcs", PESO.KG = "wt", ETA = "eta", RAZZA = "breed",
#                  HCPI = "hcp0", HCPI.4 = "hcp40", HCP.6 = "hcp60",
#                  SEGNI.OA.VET = "vet0", SEGNI.AO.VET.4 = "vet40", SEGNI.AO.VET.6 = "vet60")
#df<-rename(df, all_of(df_names))
# make sex, treatment, and sterilization factors
df$sex<-as.factor(df$sex)
df$group<-as.factor(df$group)
df$ster<-as.factor(df$ster)
#set contrasts to sum
contrasts=list(group=contr.sum, sex=contr.sum, ster=contr.sum)
```

> The model we had in that exercise linked final owner-assessed health (hcp60) to treatment (group), sex, and possibly sterilization status (ster).

> For simplicity, let's use the treatment/sex model, and start by running it for comparison

```{r}
martello.lm <- lm(hcp60~group*sex, data=df)
summary(martello.lm)
emmeans(martello.lm, ~ group:sex)
```

> Start by checking covariate ranges

```{r}
boxplot(wt~group*sex, data=df)
```

> Some slight differences, including sex-based ones, but ranges overlap, so probably OK

> Now run model and look at residuals. Try model with 3 slopes terms first

```{r}
martello2.lm <- lm(hcp60~group*sex*wt, data=df)
plot(martello2.lm)
summary(martello2.lm)
Anova(martello2.lm, type='III')
```

> Nothing here to suggest that slopes are differnt across the categorical predictors, so lets run an intercepts model

```{r}
martello3.lm <- lm(hcp60~group*sex+wt, data=df)
plot(martello3.lm)
summary(martello3.lm)
Anova(martello3.lm, type='III')
emmeans(martello3.lm, ~group|sex)
```

> Body weight plays very little role here. Adding it to the model only raises the variance explained by 0.1%, and not surprisingly, the estimated effects haven't changed much either.

### B

[LaRosa and Connor (Amer. J. Bot., 2017)](https://doi.org/10.3732/ajb.1600328) examined effects of several floral traits on fitness components of milkweeds, *Asclepias* spp. The fitness components were male and female pollination success and female reproductive success.

In the paper, their analysis focused on 6 predictors, They measured six floral traits, although one of them, hood height, was not relevant for **Asclepias viridiflora**, which was the species with the largest sample size. - gynostegium width, - hood length, - hood height, - horn reach, - slit length, and - gap width

Their Figure 2 shows what these measurements represent on flowers.

The data are available from Dryad [here](http://doi.org/10.5061/dryad.n81g1).

The floral traits were different between species, so data analysis would require each species to be analyzed separately or for the measurements to be standardized within each species.

```{r}
df <- read.csv("data/larosa.csv")
head(df,10)
df_syr<-subset(df,species=="Asyr")
df_vir<-subset(df, species=='Avir')

```

Fitness component estimates were relativized by dividing by the mean, and the traits were standardized to a mean of zero and standard deviation of one.

#### Start by looking at **A. syriaca**, then for comparison, look at how these floral traits affect **A. viridiflora**

*First look at the removals per flower*

*What checks should you do before assessing the predictors' effects?*

```{r}
scatterplotMatrix(data=df_syr,~ removals.per.flower + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
cor(df_syr[,c('gyn.width', 'hood.length', 'hood.height', 'horn.reach', 'slit.length', 'gap.width')])
vif(lm(data=df_syr, removals.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width))
```

> Diagnostics OK

```{r}
syr1.lm<-lm(data=df_syr, removals.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr1.lm)
summary(syr1.lm)
```

*If you're happy with your pre-flight checks, fit the model and make some conclusions about the effects of each predictor, including any notes of caution*

> Look at results of model fitting

```{r}
options(digits=3)
tidy(syr1.lm, conf.int = TRUE)
glance(syr1.lm)
```

> Get standardized coefficients with lmbeta

```{r}
lm.beta.syr1 <- lm.beta(syr1.lm)
tidy(lm.beta.syr1, conf.int = TRUE)
```

#### Run through same sequence for the other two life-history traits

> First do insertions.per.flower

```{r}
scatterplotMatrix(data=df_syr,~ insertions.per.flower + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)

syr2.lm<-lm(data=df_syr, insertions.per.flower ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr2.lm)
summary(syr2.lm)

options(digits=3)
tidy(syr2.lm, conf.int = TRUE)
glance(syr2.lm)

lm.beta.syr2 <- lm.beta(syr2.lm)
tidy(lm.beta.syr2, conf.int = TRUE)
```

> Model explains less of variation; overall rsq much lower. Same two predictors have highest standardized coefficients, but largely noise here

> Now do numbers of fruits

```{r}
scatterplotMatrix(data=df_syr,~ fruits + gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)

syr3.lm<-lm(data=df_syr, fruits ~ gyn.width + hood.length + hood.height + horn.reach + slit.length + gap.width)
plot(syr3.lm)
summary(syr3.lm)

options(digits=3)
tidy(syr3.lm, conf.int = TRUE)
glance(syr3.lm)

lm.beta.syr3 <- lm.beta(syr3.lm)
tidy(lm.beta.syr3, conf.int = TRUE)
```

> Still lots of unexplained variation. Only one predictor has much influence (horn reach)

*What would you need to check in doing analyses on three different fitness components as response variables?*

> Check that they aren't correlated with each other.

```{r}
cor(df_syr[,c('removals.per.flower', 'insertions.per.flower', 'fruits')])
```

> OK; one r=0.5

*What do you conclude about the floral traits' influence on fitness components of this species?*

> Floral traits affect two of the three components - hood parameters are positively related to pollen removal rates, and horn reach is linked to higher fruit numbers.

#### Now have a look at the data for the more common species Asclepias viridiflora\*

Remember that one floral trait, hood height, isn't relevant for flowers of this species

> Just show big code block here.

```{r}
#Run diagnostics
scatterplotMatrix(data=df_vir,~ removals.per.flower + gyn.width + hood.length + hood.height + slit.length + gap.width)
cor(df_vir[,c('gyn.width', 'hood.length', 'hood.height', 'slit.length', 'gap.width')])
vif(lm(data=df_vir, removals.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width))

vir1.lm<-lm(data=df_vir, removals.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir1.lm)
summary(vir1.lm)
#Look at results of model fitting

options(digits=3)
tidy(vir1.lm, conf.int = TRUE)
glance(vir1.lm)

#Get standardized coefficients with lmbeta

lm.beta.vir1 <- lm.beta(vir1.lm)
tidy(lm.beta.vir1, conf.int = TRUE)

#Run through same sequence for the other two life-history traits

scatterplotMatrix(data=df_vir,~ insertions.per.flower + gyn.width + hood.length + hood.height + slit.length + gap.width)

vir2.lm<-lm(data=df_vir, insertions.per.flower ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir2.lm)
summary(vir2.lm)

options(digits=3)
tidy(vir2.lm, conf.int = TRUE)
glance(vir2.lm)

lm.beta.vir2 <- lm.beta(vir2.lm)
tidy(lm.beta.vir2, conf.int = TRUE)


scatterplotMatrix(data=df_vir,~ fruits + gyn.width + hood.length + hood.height + slit.length + gap.width)

vir3.lm<-lm(data=df_vir, fruits ~ gyn.width + hood.length + hood.height + slit.length + gap.width)
plot(vir3.lm)
summary(vir3.lm)

options(digits=3)
tidy(vir3.lm, conf.int = TRUE)
glance(vir3.lm)

lm.beta.vir3 <- lm.beta(vir3.lm)
tidy(lm.beta.vir3, conf.int = TRUE)
```

> Hood height effect detected for removals per flower, but model rsq\<1%, and standardized estimate not particularly large Slit length detected for insertions per flower, but rsq again very low, 1.6%, and standardized estimate low Fruits also with poor model fit, rsq \<5%; highly "significant" effect of hood length detected. Standardized estimates low, 0.24 - strongest for this species.

*What do you conclude about the role of floral traits in these two species?*

> Different floral traits matter - the best predictor of each fitness component is different between the two species.

*Is there anything you'd be cautious about in making this comparison?*

> Different detection abilities for the two species, because one is commoner than the other, and easier to get a large sample.

> Watch out for students using statistical signficance only to guide conclusions

> Different sensitivies probably mean that two of the 3 effects for A. viridiflora wouldn't have been detected with the A. syriaca sample size

> Standardized coefficients suggest weaker overall effects in A. viridiflora

### C

Recall the sengi example from Chapter 5 (or go back and look at it ;-)). The research questions are about family differences. Could also look at this relationship between sengis and the rest. There are 2 or 3 groups of insectivores, sengis and closely (afrotherian) and more distantly (laurasiatherian) species, and the research question is about where sengis fit. We can frame this as 2 or 3 questions.

1.  Does the new species (*udzugwensis*) fit within the pattern of other sengi?

2.  Are sengi different from the other small insectivores in their brain size?

    1.  sengi vs all others, or

    2.  sengi vs closely-related vs distantly related insectivores

Get started by loading the kaufman data. In Chapter 5, you should have decided that log-transforming both variables was sensible, so lets also start by defining new variables logbrain and log body. That will make the coding tidier, without having to log things repeatedly.

```{r}
df <- read.csv("data/kaufman.csv")
df$logbrain <- log(df$brainmass)
df$logbody <- log(df$bodymass)
```

For the first question, cast your mind back to Chapter 6. How would you decide whether the new species is unusual?

> Use the existing species to calculate a linear regression, then calculate the predicted value for a species of the body mass of *R. udzugwensis*.

```{r}
sengi <- filter(df, family == "Macroscelididae")
sengi
sengi.lm <- lm(data=filter(sengi, species != "udzugwensis"), logbrain ~ logbody)
glance(sengi.lm)
tidy(sengi.lm)

# Now predict values for *R. udzu..*

predict(sengi.lm, data.frame(logbody = c(6.565)), interval="prediction", se=T)

```

> Predicted log brain mass is 8.91, and the prediction interval is 8.50 - 9.31, so this species is pretty much smack on the line.

#### Now lets compare sengis to the other insectivores. Use three groups for comparison (sengi, Afrotherian and Laurasiatherian). These groups are defined in the variable *relation*

\*\* You could make this comparison in two ways:

-   fit a linear model including the groups and log body mass, i.e. an analysis of covariance
-   look at the patterns in the residuals for the relationship between log-brain and log-body

\*\*Before you start, are there any things to check in the original data, linked to the assumptions of the linear model you'll fit?

> The other important thing to note is that if we're looking to compare sengis and other groups formally, we need to be careful about the ranges of body sizes. Sengi species start around the middle of the range, from 45g up. We'd probably restrict our comparison to species around this size - exclude those with body mass less than a threshold. Threshold would be arbitrary, but most people would choose 40 or 45g

```{r}
ggplot(data=df, aes(x=log(bodymass), y=log(brainmass), colour = relation, shape = relation))+
  geom_point()+ theme_light() + scale_color_uchicago()
```

##### Analysis of covariance

**Outline the steps you'll take**

> 1.  Fit complex model allowing slopes to vary among groups
> 2.  Assess whether the groups\*logbody term should remain in the model
> 3.  If relationships can be treated as parallel, run simpler model with groups and logbody
> 4.  Examine results and decide whether to do any comparisons among groups

```{r}
df$relation<-factor(df$relation)
df$relation2<-factor(df$relation2)
contrasts = list (relation = contr.sum)
# Create subset of file. We could keep running the filter each time we run the model, but doing it once is tidier.
df2 <- filter(df, bodymass > 40)
lm2<-aov(logbrain~logbody + relation + logbody*relation, data = df2)
glance(lm2)
tidy(lm2)
Anova(lm2, type='III')
```

> logbody\*relation term contributing little to model fit, so run simpler model

```{r}
lm3<-aov(logbrain~logbody + relation, data = df2)
glance(lm3)
tidy(lm3)
Anova(lm3, type='III')

#Get adjusted means
library(effects)
adjmeans <- effect("relation", lm3, se=TRUE)
summary(adjmeans)
ggplot(data=df2, aes(x=log(bodymass), y=log(brainmass), group= relation, colour = relation, shape = relation))+
  geom_point()+ geom_smooth(method=lm) + theme_light() + scale_color_uchicago()
```

> No need to proceed further. We conclude that slopes differ between the groups, and sengi are clearly above the other two. The other two groups are close, with abutting confidence intervals around the adjusted means.

##### Use residuals from a regression of all data and compare residuals between groups

```{r}
lm4 <- lm(log(brainmass) ~ log(bodymass), data=df2)
df3 <- cbind(df2, lm4$residuals)
head(df3,10)
boxplot(lm4$residuals ~ relation, data = df3) 
lm5 <- aov (lm4$residuals ~ relation, data = df3)
summary(lm5)
```

> Pattern also clear here, just from box plots

### D

We'll return to the elephant seal example in the study by LaBoeuf et al., and see whether body weight plays any role in foraging. In Chapter 5, you should have noticed that while the focus of the initial analysis was on the relationship between distance travelled and time spend on the foraging grounds, the authors recorded weight on departure for each animal. Your exploratory data analysis should have shown a relationship between body weight and the original predictor and response variables. Now try and make some sense of what's going on here.

```{r}
#Get the data file back
df <- read.csv("data/leboeuf.csv")
head(df,10)
```

**Think about how body mass might influence distance travelled and how it might contribute to time on foraging areas**

>Mass could easily be linked to swimming speed, allowing larger animals to forage for longer. We could even get more complicated and speculate whether larger males may spend more time maintaining dominance on the shore, so might forage less. In either case, the two variables could be linked.

>Body mass might also reflect overall condition, and, for example, males in poorer condidtion might need to spend longer feeding.

**How will you assess whether including body weight as a second predictor helps us understand feeding time better?**

-   Write out the linear model you'd apply

> Time on foraging = *β~0~* + *β~1~* Distance travelled + *β~2~* Body weight

-   What checks do you need when fitting the model?

> Need to check collinearity - use VIF

> Check residuals

> Check linearity

```{r}
cor(df[,c('distance','departwt')])
laboeuf1.lm <- lm (FFAduration~ distance + departwt, data = df)
vif(laboeuf1.lm)
plot(laboeuf1.lm)
```

> VIF value not small, but below the common threshold of 5

> Nothing dramatic in residuals, though points 13, 14, and 16 have large residuals

```{r}
augment(laboeuf1.lm)
```


**Fit the appropriate model to the data, interpret the results, and explain whether body weight helps us.**

>Get the equation

```{r echo=FALSE, results='asis'}
equatiomatic::extract_eq(laboeuf1.lm)
```


```{r}
glance(laboeuf1.lm)
tidy(laboeuf1.lm)
```

> Get standarised regression coefficients

> Use lm.beta function from lm.beta package

```{r }
lm.beta.laboeuf <- lm.beta(laboeuf1.lm)
tidy(lm.beta.laboeuf, conf.int = TRUE)
glance(lm.beta.laboeuf)
```

> Show the partial regression (added-variable) plots

```{r }
avPlots(laboeuf1.lm, ask=F)
```

> Conclusion is that there's a strong statistical relationship. The model explains around 60% of variation, and that effect is largely associated with distance travelled. Time on foraging grounds falls sharply with distance.

> Departure rate plays little role.

**Is there anything else you might look at?**

> The relationship of distance to both response and the other predictor might make it worth looking for a non-additive model

```{r}
laboeuf2.lm <- lm (FFAduration~ distance + departwt + distance*departwt, data = df)
vif(laboeuf2.lm)
# High vif values, as expected, so centre predictors and run again
df$cdistance <- scale(df$distance, center=TRUE, scale=FALSE)
df$cdepartwt <- scale(df$departwt, center=TRUE, scale=FALSE)
laboeuf3.lm <- lm (FFAduration~ cdistance + cdepartwt + cdistance*cdepartwt, data = df)
vif(laboeuf3.lm)
lm.beta.laboeuf3 <- lm.beta(laboeuf3.lm)
tidy(lm.beta.laboeuf3, conf.int = TRUE)
glance(lm.beta.laboeuf3)

```

>Actually a worse model; more issues with VIF, adjusted r-sq lower, and combined term contributes nothing